ABSTRACT


One of the goals of computer technology is to have the ability to 
communicate with the computer in a natural language such as English. A 
research effort underway at the Naval Postgraduate School involves the 
design and implementation of a computer system for translating natural 
language descriptions of simulation problems into executable computer 
programs. In this system, English text is translated into an internal 
data structure which is then translated into a computer program for 
performing the simulation. 

This thesis reports on an effort made to aid the user of this system
by (1) extending the capabilities of an existing procedure for translating
the internal data structure into a GPSS simulation program, and (2)
developing a procedure for translating the data structure into English
text so the user could see that his input text had been correctly
interpreted. The basic operation of the system is described and examples
are given to illustrate the system's capabilities.
I. INTRODUCTION


One of the goals of computer technology is to have the ability to
communicate with the computer in a natural language such as English.
Ideally, the user would not need to know anything about computer hardware
or software or programming languages. He would explain the work to be
done or ask questions of the computer and supply input data, all in a
natural language. The computer would then perform the necessary work and
supply the answer in a natural language format. To date, little of this
type of work with natural language input to a computer has produced
anything of practical value.

In early attempts to develop a natural language processor, brute force
techniques were tried and, for the most part, discarded as both unwieldy
and unworkable. Current attempts revolve around the development of language
theories. Probably the most widely known natural language theory is that
of transformational grammar by Noam Chomsky [1]. Another is that of
stratificational grammar by Sydney Lamb [2-4]. A summary of current efforts
in natural language computer processing may be found in Refs. 5 and 6.

A natural language processor (NLP) currently being developed at the
Naval Postgraduate School [7] uses Lamb's theory of a stratified grammar
in processing natural language. The immediate goal of NLP is the
translation of a natural language expression of a simulation problem into
a computer program written in a simulation programming language. By
initially limiting the scope of the natural language input to a subset
of English, an attempt is being made to produce a system of some immediate
practical value, a system which can be used by persons who wish to solve
simulation problems without learning the intricacies of a computer
simulation language. The long range goal of NLP is the translation of
any input language to any output language. The method of processing
involves the decoding of the input language text into an internal data
structure (IDS) and then the encoding of the IDS into an output language
text, all under the control of a FORTRAN program, guided by appropriate
sets of decoding and encoding "rules."

The objective of the research being reported on in this thesis was to
aid the user of NLP by (1) improving the appearance of the output
simulation program, currently in GPSS, and (2) making it possible to
translate the IDS into English so the user could see if his problem had
been stated correctly. The first area involved expanding on and testing
work done by LCDR Richard Hansen [8] in translating the IDS into GPSS
programs. The second area involved developing a system to translate the
IDS into English.

This thesis begins by presenting a brief explanation of NLP as it
relates to the achievement of the above objectives. The following two
sections are then devoted to a description of the systems developed, and
the final section presents conclusions and recommendations for future
work. Reference 8 contains a detailed discussion of NLP and the
development of the IDS-to-GPSS encoding system. Familiarity with the
material presented there is necessary for a thorough understanding of
this thesis.







II. THE OPERATION OF NLP


In order to introduce the reader to the procedure for translating the
IDS to GPSS, English or any other language, this section will present the
overall operations of NLP as they apply to this thesis. Discussion of the
natural-language-to-IDS operation of NLP will be limited to that necessary
to understand the development of the IDS.


A. BASIC OPERATION 

The basic operation of NLP is extremely straightforward; the input
language is decoded into the IDS, and then the IDS is encoded into the
output language. The main idea behind the operation of NLP, and therefore
the underlying reason for the form of the IDS, is that of extracting the
meaning from the input language. The IDS is structured to store those
elements which "contain" the meaning. A vital point in viewing the
translation process is that the input text, the IDS, and the output text
are merely three roughly equivalent representations of one meaning. An
example may clarify the concept of extracting the meaning. Under the
decoding process, the sentences "John hit Mary" and "Mary was hit by John"
would result in exactly the same IDS:


Sup -- hit 
agent -- John 
goal -- Mary 


The meaning is then available in the IDS for encoding into the desired
output language.
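The point can be made concrete with a toy sketch in Python. It is not
NLP's decoder, which is driven by stratificational decoding rules; the two
hard-coded sentence patterns, the function name decode and the use of a
dictionary for the record are all assumptions made only for illustration.

```python
import re

# Toy decoder: recognises only "X <verb> Y" and "Y was <participle> by X".
PASSIVE = re.compile(r"^(\w+) was (\w+) by (\w+)$")
ACTIVE = re.compile(r"^(\w+) (\w+) (\w+)$")

# Hypothetical map from past participle to the verb stored as SUP.
PARTICIPLE_TO_VERB = {"hit": "hit"}


def decode(sentence: str) -> dict:
    """Return an IDS-like record with SUP, AGENT and GOAL attributes."""
    m = PASSIVE.match(sentence)
    if m:
        goal, participle, agent = m.groups()
        return {"SUP": PARTICIPLE_TO_VERB[participle], "AGENT": agent, "GOAL": goal}
    m = ACTIVE.match(sentence)
    if m:
        agent, verb, goal = m.groups()
        return {"SUP": verb, "AGENT": agent, "GOAL": goal}
    raise ValueError(f"unrecognised sentence: {sentence!r}")
```

Both decode("John hit Mary") and decode("Mary was hit by John") return the
same record, which is the sense in which the IDS holds the meaning rather
than the surface form of the sentence.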








B. STRUCTURE OF IDS


The major conceptual building block of the IDS is the "record." A
record represents an "entity," where the entity may be a complex sentence
or a simple noun. The flexible structure of the record allows it to
expand and contract as necessary to contain the distinguishing attribute
values of the entity. A simplified record for "Customers arrive at the
bank every 10 minutes" would have the following attribute-value pairs:


    Attribute                  Value
    SUP (entity type)          arrive
    AGENT                      customers
    GOAL
    LOCATION                   bank
    IETM (inter-event time)    10 minutes

The actual form of this record in the IDS would be considerably more
cryptic than is indicated above in order to conserve space in the IDS.
Special types of attributes called indicators are employed by NLP in
situations where the value of the attribute may be represented by a zero
or one, and attributes such as "goal" would actually be a number in the
IDS. A more detailed description of the computer representation of an
IDS record is given in Ref. 8.
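A minimal sketch of such a record, assuming a Python dictionary for the
open-ended attribute-value pairs and a single integer whose bits serve as
the zero-or-one indicators. The Record class and the indicator number used
below are hypothetical; the actual layout is the one described in Ref. 8.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Illustrative IDS-style record: open-ended attributes plus packed indicators."""
    attributes: dict = field(default_factory=dict)
    indicators: int = 0  # bit n holds the zero-or-one value of indicator n

    def set_indicator(self, bit: int, on: bool = True) -> None:
        if on:
            self.indicators |= 1 << bit
        else:
            self.indicators &= ~(1 << bit)

    def has_indicator(self, bit: int) -> bool:
        return bool((self.indicators >> bit) & 1)


# Hypothetical indicator number, chosen only for this example.
PLURAL = 0

# The simplified "Customers arrive at the bank every 10 minutes" record.
arrival = Record({"SUP": "arrive", "AGENT": "customers",
                  "LOCATION": "bank", "IETM": (10, "minutes")})
arrival.set_indicator(PLURAL)
```

Adding an attribute is just adding a key, which mirrors the record's
ability to expand and contract, while each indicator costs one bit rather
than a full attribute slot.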

A special type of record which will be referred to throughout the
thesis is the SEGMENT. A SEGMENT is a record which represents the
information in that part of the IDS currently being processed. As a
record, a SEGMENT may range from simple to extremely complex and may
possess any number of attributes. A SEGMENT may represent a portion of an
IDS entity or an entire entity. SEGMENTS may be of different types
depending upon the information they represent. For example, a record
representing an IDS action would be called an ACT type SEGMENT. For each
type of SEGMENT in the system there is a record called a SEGMENT TYPE
record which contains information about that type of SEGMENT. For
instance, a SEGMENT TYPE record has an attribute which points to a
character string which is the name of the SEGMENT TYPE (e.g. ACT). Another
attribute possessed by a SEGMENT TYPE record is a list of the encoding
rules which begin with SEGMENTS of this type.


C. NLP ENCODING 

The process of translation from the IDS to the target output language
is called encoding. Encoding consists of two phases. The first phase
involves the preprocessing of the encoding rules in a compilation-like
step. In the second phase the IDS is used as input data to the processed
rules in an execution-like step. The structure generated in the first
phase is similar in many respects to the object module produced as a
result of a FORTRAN compilation. The structure may be saved and used to
translate as many different IDS's as desired. Although the appearance of
encoding rules for different target languages may vary considerably, the
format is identical for them all. Appendices B and D are listings of the
sets of rules developed in this research.


D. NLP ENCODING RULE FORMAT 


All encoding rules follow the format: 


    SEGMENT TYPE (condition 1, condition 2, ...) -->
        SEGMENT TYPE (action 1, action 2, ...)
        SEGMENT TYPE (action 1, action 2, ...)





For a particular IDS SEGMENT, rule scan begins at the head of the list of
rules for that SEGMENT TYPE and continues until the conditions existing in
that SEGMENT match the conditions of a rule on the list, or until the
default is taken due to failure to find any applicable rule. The default
option simply causes the EBCDIC string in the ANMS attribute of the record
pointed to by the SUP attribute of the SEGMENT to be output. Once one
applicable rule is found, the remaining rules are ignored. The conditions
may range from specifying only that an attribute be present in the
SEGMENT, to requiring that the attribute possess a certain value. A rule
may have several conditions or none at all (in which case there would
exist only one rule of that SEGMENT TYPE).
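The first-match scan with its default can be sketched in Python as
follows, assuming a SEGMENT is a dictionary, the record reached through SUP
is a nested dictionary, a rule's left part maps each attribute to a
required value, and the marker PRESENT stands for a condition that only
asks for the attribute to exist. The names Rule, satisfies, scan and
default_output are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

PRESENT = object()  # marker: the condition only requires the attribute to exist


@dataclass
class Rule:
    conditions: dict  # attribute -> required value, or PRESENT
    actions: list = field(default_factory=list)  # right part, left abstract here


def satisfies(segment: dict, rule: Rule) -> bool:
    """True if every condition holds; a rule with no conditions always holds."""
    for attr, wanted in rule.conditions.items():
        if attr not in segment:
            return False
        if wanted is not PRESENT and segment[attr] != wanted:
            return False
    return True


def scan(segment: dict, rules: list) -> Optional[Rule]:
    """Scan from the head of the list; return the first match, or None for the default."""
    for rule in rules:
        if satisfies(segment, rule):
            return rule  # the remaining rules are ignored
    return None


def default_output(segment: dict) -> str:
    """Default option: the ANMS string of the record the SUP attribute points to."""
    return segment["SUP"]["ANMS"]


# Usage: the first rule needs an IETM attribute, the second a particular AGENT.
needs_ietm = Rule({"IETM": PRESENT})
needs_customers = Rule({"AGENT": "customers"})
segment = {"SUP": {"ANMS": "arrive"}, "AGENT": "customers"}
chosen = scan(segment, [needs_ietm, needs_customers])
```

Here chosen is needs_customers, because the SEGMENT lacks IETM; a SEGMENT
matching neither rule makes scan return None, and the caller then falls
back on default_output.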

After an applicable rule is found, new SEGMENTS are created as dictated
by the portion of the rule following the conversion symbol (-->). The
newly created SEGMENT(S) may have the same SEGMENT TYPE as the old SEGMENT
or a new SEGMENT TYPE. In either situation the new SEGMENT(S) may possess
any or all of the attributes of the old SEGMENT and may, in addition, have
new attributes with specified values. Appendix A of Ref. 8 gives a
complete BNF description of the encoding rules, and Appendix B of Ref. 8
contains the encoding rule symbology and further explains the encoding
rule format.
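Creating the new SEGMENTS for a rule's right part can be sketched in the
same style. The helper below is hypothetical: it copies either all of the
old SEGMENT's attributes or a chosen subset, adds any new attribute values,
and pairs the result with its SEGMENT TYPE name so that it can later be
pushed as an SP/STP pair.

```python
from typing import Optional, Tuple


def make_segment(old: dict, seg_type: str,
                 keep: Optional[list] = None,
                 new: Optional[dict] = None) -> Tuple[str, dict]:
    """Build one new SEGMENT from the old one.

    keep=None copies every attribute of the old SEGMENT; a list copies only
    the named attributes that are present. new adds attributes with
    specified values, overriding copied ones of the same name.
    The old SEGMENT itself is never modified.
    """
    if keep is None:
        segment = dict(old)
    else:
        segment = {attr: old[attr] for attr in keep if attr in old}
    segment.update(new or {})
    return seg_type, segment
```

For example, make_segment(old, "ACT", keep=["SUP"], new={"FORM": "past"})
yields an ACT SEGMENT carrying only SUP plus a new FORM attribute (FORM is
an invented attribute name).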


E. THE IDS ENCODING PROCESS

Once the encoding rules have been "compiled," the "execution" phase
may begin. The processing of the IDS is accomplished through the operation
of a dual push-down stack. One side of the stack contains SP's, pointers
to SEGMENT records, and the other side contains STP's, pointers to SEGMENT
TYPE records. The basic cycle of stack operation begins by popping off
the top SP and STP. The SEGMENT TYPE record is obtained through the STP
and is examined. If it is a terminal SEGMENT TYPE record, then the value
of the ANMS attribute, i.e. the name of the SEGMENT TYPE, is written out.
If the SEGMENT TYPE record is not a terminal SEGMENT TYPE record, the ANMS
attribute is examined to see if it has the value "OUTPUT." If it does,
the SEGMENT pointed to by the SP is accessed. The action then taken
depends upon the attributes of the "OUTPUT" SEGMENT. The possible actions
are:

1. Skipping lines in the written output
2. Shifting the printer to a desired column
3. Printing an EBCDIC string
4. Printing an integer number
5. Printing a decimal number

If the "OUTPUT" SEGMENT has no attributes, no output operation is performed.
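The handling of an "OUTPUT" SEGMENT can be sketched as below, assuming a
simple line-printer model. The attribute names SKIP, COL, STR, INT and
DEC, the order in which they are checked, the meaning of "skip n lines"
(finish the line, then leave n blank lines) and the two-decimal format are
all assumptions for illustration, not the names or conventions used by NLP.

```python
class Printer:
    """Hypothetical printer model: finished lines plus the line being built."""

    def __init__(self) -> None:
        self.lines: list = []
        self.current = ""

    def skip(self, n: int) -> None:
        # Finish the current line, then leave n blank lines.
        self.lines.append(self.current)
        self.lines.extend([""] * n)
        self.current = ""

    def column(self, col: int) -> None:
        # Columns are 1-based; pad so the next character lands in column col.
        self.current = self.current.ljust(col - 1)


def perform_output(segment: dict, p: Printer) -> None:
    """Carry out whatever the OUTPUT SEGMENT's attributes request."""
    if "SKIP" in segment:
        p.skip(segment["SKIP"])
    if "COL" in segment:
        p.column(segment["COL"])
    if "STR" in segment:
        p.current += segment["STR"]
    if "INT" in segment:
        p.current += str(int(segment["INT"]))
    if "DEC" in segment:
        p.current += f"{segment['DEC']:.2f}"
    # A SEGMENT with none of these attributes performs no output operation.
```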
Upon failure of the first two examinations, the list of rules pointed
to by an attribute of the SEGMENT TYPE record is obtained, and the list is
scanned, rule by rule. The SEGMENT pointed to by the SP is tested to see
if it satisfies the conditions of the rule currently being considered. If
the SEGMENT does not completely satisfy any of the rules, then the default
option is taken; that is, the record pointed to by the SUP attribute of
the SEGMENT is accessed and the EBCDIC string pointed to by the ANMS
attribute is written out. If a rule is satisfied, then rule scan stops
and the actions specified by the right part of the rule are taken. SP's
and STP's for newly created SEGMENTS are placed on the stack in the
inverse order of their creation; that is, bottom first and top last.
Finally, the cycle is completed by erasing the SP and the STP which were
popped off and erasing the SEGMENT pointed to by the SP.







To begin the processing of the IDS, an initial STP is placed on the
stack with a null SP. The initial SEGMENT TYPE always has one rule with
no conditions, and therefore it is always satisfied. The basic cycle of
stack operation is repeated until the stack is empty, at which time
processing of the IDS is complete. Since SEGMENTS created during
processing contain only copies of portions of the IDS or pointers to IDS
records, the erasure of SEGMENTS does not affect the IDS. Records in the
IDS may be altered by accessing them through the MEMORY record. The MEMORY
record is a unique record which contains pointers to the important records
in the IDS and other attributes which are used for counting and storing
numbers. Any reference in the encoding rules to either MEM or MEMORY
provides direct access to this record. A complete graphical presentation
of the operation of the dual push-down stack and the resulting output is
given in the figures of Ref. 8, beginning with Figure 13.
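The whole execution phase can be summarized in a short Python sketch. It
is a simplification under stated assumptions, not the FORTRAN
implementation: a rule is a (condition, action) pair of functions, an
action returns the new (SP, STP) pairs in creation order, only string
printing is modeled for "OUTPUT" SEGMENTS, the MEMORY record is omitted,
and dropping Python references stands in for explicit erasure.
SegmentType and encode are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentType:
    name: str                 # the ANMS string, e.g. "ACT" or "OUTPUT"
    terminal: bool = False
    rules: list = field(default_factory=list)  # (condition, action) pairs


def encode(initial: SegmentType) -> list:
    """Run the dual push-down stack until it is empty; return what was written."""
    written = []
    stack = [(None, initial)]  # each entry is an (SP, STP) pair; initial SP is null
    while stack:
        sp, stp = stack.pop()  # pop the top SP and STP together
        if stp.terminal:
            written.append(stp.name)  # terminal type: write out its name
        elif stp.name == "OUTPUT":
            if sp and "STR" in sp:    # only string printing is modeled here
                written.append(sp["STR"])
        else:
            for condition, action in stp.rules:
                if condition(sp):
                    created = action(sp)  # new (SP, STP) pairs, in creation order
                    # Push bottom first, top last, so the first-created
                    # pair is popped next.
                    stack.extend(reversed(created))
                    break
            else:
                written.append(sp["SUP"]["ANMS"])  # default option
        # The popped pair and its SEGMENT are simply dropped here.
    return written


# Usage: a start type whose single unconditional rule creates three SEGMENTS.
NOUN = SegmentType("NOUN")                  # no rules, so the default applies
ARRIVE = SegmentType("arrive", terminal=True)
OUTPUT = SegmentType("OUTPUT")
START = SegmentType("START", rules=[(
    lambda sp: True,
    lambda sp: [({"SUP": {"ANMS": "customers"}}, NOUN),
                (None, ARRIVE),
                ({"STR": "."}, OUTPUT)],
)])
result = encode(START)
```

The usage run writes "customers", "arrive" and "." in that order, because
the reversed push leaves the first SEGMENT of the rule's right part on top
of the stack.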
III. EXTENSION OF GPSS ENCODING SYSTEM


This section deals with the extensions and modifications to GES: A
Data-Structure-to-GPSS Encoding System developed by Hansen [8]. The
primary goal of this portion of the work was to make GES more useful to
the user with little background in GPSS, and therefore the emphasis was
on improving the appearance and readability of the GPSS program and its
resulting output. As the work progressed, several other changes were
made to GES in the interest of extending the system capability and
increasing the visibility of the flow in the rule processing. However, all
of these changes were just modifications to or extensions of the already
sound structure of GES, and therefore the resulting system has been
labelled XGES: An Extended GPSS Encoding System.


A. IMPROVED OUTPUT APPEARANCE 

The solution to the somewhat cryptic appearance of the GPSS program
and its output was a two-step process. Investigation of GPSS documentation
[9] revealed that the addition of EQU cards to the GPSS program could
increase the readability of both the GPSS program and the program's output.
By simply equating an entity name, e.g. ship, and its internal
identification number, the GPSS assembler would replace all occurrences of
the entity identification number with the entity name in both the GPSS
program and GPSS output. The second step in the process involved adding
the additional attribute, IDNAME, to the list of attributes of an entity.
The encoding rules to produce the EQU card were written so that if the
user provided an IDNAME in his description of the problem, then an EQU
card would be output. Otherwise, no EQU card would be produced and the
program appearance would remain as it was produced by GES. Figure 1 is a
description of a harbor facility queuing problem. Figures 2 through 4
illustrate the direct input version of the IDS, the GPSS program, and the
output from the GPSS program when this additional feature is used. A
direct comparison of Figures 1 and 2 may be made with Figures 19 and 20
of Ref. 8 for the same problem. A final minor change made to improve the
appearance was to alter the GES encoding rules to output the normal and
exponential function definitions only when required, rather than
arbitrarily producing them for every problem.
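The two rules just described, an EQU card only when the user supplied an
IDNAME and function definitions only when some distribution needs them,
amount to simple conditionals. The Python sketch below shows that logic
under assumptions: entities are dictionaries with hypothetical ID and
IDNAME keys, and cards are written in a free format "NAME EQU n" rather
than in GPSS's fixed card columns.

```python
def equ_cards(entities: list) -> list:
    """Emit an EQU card for each entity that has a user-supplied IDNAME."""
    cards = []
    for entity in entities:
        name = entity.get("IDNAME")
        if name:  # no IDNAME: no card, so the program looks as it did under GES
            cards.append(f"{name.upper()} EQU {entity['ID']}")
    return cards


def required_functions(distributions: set) -> list:
    """List only the function definitions needed, instead of always emitting both."""
    return [kind for kind in ("EXPONENTIAL", "NORMAL") if kind in distributions]
```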
EXAMPLE PROBLEM


There is a port containing a harbor, 3 docks, 2 piers, a depot,
and a barge. Ships arrive at the port with an interarrival time of
5 hours, uniformly distributed, with a range of ± 1 hour. 50% of the
ships are blue ships, 30% are red, and 20% are green. After a ship
arrives at the port, it unloads cargo at any available dock. Each dock
has a capacity of 1 unit. Each ship takes up 1 unit of capacity.
Unloading time at the dock is normally distributed as follows:

blue ship - mean of 5 hours, std dev of 1.5 hours
red ship - mean of 4 hours, std dev of 1.0 hours
green ship - mean of 3 hours, std dev of .5 hours

After unloading at a dock, a blue ship unloads cargo at the barge, a red
ship unloads cargo at the depot, and a green ship unloads cargo at a
pier. The barge has a capacity of 1 unit, a pier has a capacity of 1
unit, the depot has a capacity of 4 units. Unloading times are as follows:

barge - 1.5 hours, exponentially distributed
depot - 1 hour, exponentially distributed
pier - 1 hour, normally distributed, std dev of 15 minutes

Next, after these latest unloadings, 40% of the ships load cargo at a
dock, and the remainder wait in the harbor. Dock loading time is 2 hours
for any ship. After loading cargo at a dock, a ship waits in the harbor.
A ship waits in the harbor until the barge is unoccupied. After waiting
in the harbor, a ship leaves the port. The basic time unit is the minute.
Problem duration is 4 days.


Figure 1 - Description of Harbor Problem 
Figure 4 - GPSS Output for Harbor Problem
Figure 4 - Continued





B. ENTITY TRANSIT TIME 

Entity transit time is simply the total time an entity spends in the
system. In order to gather this statistic, GPSS requires a tabulate card
to mark an entity's exit from the system (entry is marked automatically)
and a table card for each different type of entity passing through the
system. A simple modification to the rules produces the required cards.
An interesting feature of the rule structure allows the user to provide
the statistical parameters for the table definition cards or to default
and let the system provide them. Figures 3 and 4 illustrate an example
where the user has provided the parameters for one table definition card
and has let the system provide the parameters for the others.
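A sketch of the card-generation logic, assuming GPSS's TABLE definition
operands (A: the attribute tabulated, here M1, the transit-time standard
numerical attribute; B: upper limit of the lowest interval; C: interval
width; D: number of intervals). The default parameter values and the
function name are hypothetical; the actual defaults are those chosen by
the XGES rules.

```python
from typing import Optional, Tuple

# Hypothetical defaults: (upper limit of lowest interval, interval width, intervals).
DEFAULT_TABLE = (0, 60, 20)


def transit_time_cards(entity: str,
                       params: Optional[Tuple[int, int, int]] = None) -> Tuple[str, str]:
    """Return (TABLE definition card, TABULATE block) for one entity type.

    M1 is the transaction's transit time; the TABULATE block goes where the
    entity leaves the system. User-supplied params override the defaults.
    """
    low, width, count = params if params is not None else DEFAULT_TABLE
    table = f"{entity} TABLE M1,{low},{width},{count}"
    tabulate = f"TABULATE {entity}"
    return table, tabulate
```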


C. SHORTEST QUEUE CHOICE

A situation which arises repeatedly in simulation problems requires
that an arriving customer make a choice of queues. Normally the action
taken is to queue in the shortest line which provides the desired service.
The GPSS equivalent is the select block. As no provisions for generation
of this type of block existed under GES, rules were written to provide the
capability in XGES. An example of a simulation problem which requires
such a choice is the bank problem described in Figure 5. To demonstrate
the flexibility of the encoding rules, two different IDS's were specified
in direct format and are shown in Figures 6 and 7. The GPSS output from
the XGES processing of the IDS's is shown in Figures 8 and 9. Although
different in appearance, the two GPSS programs produce exactly the same
results when executed.
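Inside the generated program the select block makes this choice. The
decision rule itself is small; the Python sketch below models it for the
bank problem, assuming hypothetical window names W1 to W5 and a dictionary
of current line lengths. Ties go to the first eligible window in the given
order, an assumption meant to mirror a block that scans its range from the
low end.

```python
def shortest_line(line_lengths: dict, eligible: list) -> str:
    """Return the eligible window whose line is shortest (first one on ties)."""
    # min() keeps the first of several equal minima, giving the tie rule above.
    return min(eligible, key=lambda window: line_lengths[window])


# Bank problem: windows W1-W3 serve commercial accounts, W4-W5 personal
# accounts, and a customer with miscellaneous business may use any window.
COMMERCIAL = ["W1", "W2", "W3"]
PERSONAL = ["W4", "W5"]
ANY = COMMERCIAL + PERSONAL
```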
EXAMPLE PROBLEM


Customers arrive at a bank in a truly random manner. 
One third of them have commercial accounts, one fourth of them 
have personal accounts, and the rest just have miscellaneous 
business. The bank has three windows for commercial accounts 
and two windows for personal accounts. When a customer with 
a commercial account arrives at the bank, he stands at the 
commercial accounts window which has the shortest line. When 
a customer with a personal account arrives, he stands at that 
personal accounts window which has the shortest line. Any 
other customer, upon arrival, stands at any window that happens 
to have the shortest line. 

The customers are serviced at the windows, one at a time. 
The times to service customers are exponentially distributed,
with means of 5, 4, and 2 minutes, for the commercial accounts,
personal accounts, and the others, respectively. After being
serviced, a customer leaves the bank.

If the mean interarrival time for customers at the bank is
10 minutes, what is the average length of time each type of
customer is in the bank during a five-hour day, and what percent
of the time are the windows busy?


Figure 5 - Description of Bank Problem
D. OTHER MODIFICATIONS

An interesting feature of the GES system is the ability to handle
sub-structured entities. The structuring is demonstrated below for the
harbor and bank problems of Figures 1 and 5.


SHIPS
    Blue Ships
    Red Ships
    Green Ships

CUSTOMERS
    Commercial
    Personal
    Misc.

WINDOWS
    Commercial
        Window 1
        Window 2
        Window 3
    Personal
        Window 4
        Window 5

The lower level entities, e.g. Green Ships, possess all the attributes
associated with the entity in the level above, e.g. SHIPS. Thus, those
attributes which are common to all the entities in the structure may be
specified in the upper level, and only those attributes which
differentiate entities must be specified at the lower level. While GES has
the ability to detect an entity which is at the sub-structure level, it
has no ability to detect those entities which head a structure. This
ability was incorporated into XGES and was used to avoid outputting
unnecessary EQU statements and table definition cards.
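Attribute inheritance through a sub-structure, and the test for an entity
that heads one, can be sketched in Python. The Entity class, its method
names and the attribute name UNLOAD_MEAN are hypothetical; the sample
values come from the harbor problem, with times in minutes.

```python
from typing import Optional


class Entity:
    """An entity in a sub-structure; lower levels inherit attributes from above."""

    def __init__(self, name: str, parent: Optional["Entity"] = None, **attrs) -> None:
        self.name = name
        self.parent = parent
        self.attrs = attrs
        self.children: list = []
        if parent is not None:
            parent.children.append(self)

    def get(self, attr: str):
        """Look the attribute up here, then at each higher level in turn."""
        node = self
        while node is not None:
            if attr in node.attrs:
                return node.attrs[attr]
            node = node.parent
        raise KeyError(attr)

    def heads_structure(self) -> bool:
        """True for an entity, such as SHIPS, that has sub-entities."""
        return bool(self.children)


# Common attributes live on SHIPS; only the differences go on the sub-entities.
ships = Entity("SHIPS", IETM=300)
green = Entity("GREEN SHIPS", ships, UNLOAD_MEAN=180)
blue = Entity("BLUE SHIPS", ships, UNLOAD_MEAN=300)

# As in XGES, skip EQU and table cards for the entity that heads the structure.
needs_cards = [e.name for e in (ships, green, blue) if not e.heads_structure()]
```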

During the revision of GES, the encoding rules were reorganized in order
to increase the visibility of the encoding rule processing logic. The
resulting XGES encoding rules are listed in Appendix B and the attributes
and named records are shown in Appendix A. Figure 10 is a final example
of the simple queuing problems which XGES is capable of encoding. Figure
11 is the direct input IDS specification of the problem and Figure 12 is
the resulting GPSS program.










EXAMPLE PROBLEM 


Cars arrive at a gas station randomly. On the average, 
one car arrives every five minutes. This gas station has 
one pump at which cars are serviced individually. The mean 
service time is four minutes. The distribution of service 
times is uniform, with times ranging from two to six minutes. 

When a car arrives at the gas station, if there are no 
other cars in line, the car goes to the pump, is serviced, 
and then leaves the gas station. If there are one, two, or 
three cars in the line, the car gets in line to wait for its 
turn at the pump. When its turn comes, it is serviced, and 
then leaves the gas station. If there are four cars in line,
the arriving car leaves the gas station immediately.


Figure 10 - Description of Gas Station Problem 
FIGURE 12 - GPSS REPRESENTATION OF GAS STATION PROBLEM








IV. ENCODING SYSTEM FOR TRANSLATION
FROM THE IDS TO ENGLISH


The decision to develop a system to translate the IDS version of a
simulation problem into English was based on the idea that a user who was
not familiar at all with GPSS would have no means of knowing if his
original expression of the problem had been translated into GPSS
correctly. Such a system would be useful regardless of how the IDS was
created. Whether the IDS was specified directly, was created through a
question-answer system, or was the result of the decoding of natural
language text, the retranslation into English should help to reveal any
"misunderstandings."


A. OBJECTIVE AND INITIAL ASSUMPTIONS 

The initial objective in developing an IDS-to-English system was to
create a system capable of translating the IDS into English sentences
appropriate for expressing simulation problems. The development of the
system was to be based on the idea of maintaining the greatest flexibility
and generality possible, so that once the system had been developed to the
point that it produced simulation problem descriptions in reasonable
English, only a little additional effort would allow variations in the
expression of the same problem and, in general, lend some elegance to the
English produced. The English description would be based on the
ENTITY-ACTION-(at a LOCATION) form inherent in the Internal Data
Structure [8].

The first assumption made was that the system would consist of a set
of encoding rules somewhat similar to those of XGES. The second
assumption was that if the rules were properly stratified in their
development, then the IDS-to-English encoding system would automatically
possess sufficient flexibility and generality to allow easy expansion of
the system's capabilities. These assumptions were based upon the
experience gained through the work with expanding GES and through
knowledge of the capabilities of the rule language.


B. RULE DEVELOPMENT 

Once the decision was made to write encoding rules, it was necessary
to develop the rules to obtain the information from the IDS in some
logical order. That is, before any English text could be produced, the
meaning had to be extracted from the IDS in proper semantic sequence for
translation into English. The vehicle which provided this sequence was
the action list of the MEMORY record. Through the action list all the
necessary information could be accessed, and the order of the actions on
the list is a reasonable order for the English expression of the problem.
The successor attribute of each action record was used as an additional
means of specifying simulation problem action and was also used to smooth
the English expression of the problem flow from one action to the next.
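How an ordered action list plus successor links can drive smoother
English is illustrated by the Python sketch below. It is not the
Appendix D rule set: each action is a dictionary with hypothetical keys
agent, verb, gerund, location and an optional successor index, and the
only smoothing shown is an "After ..." clause, in the style of Figure 1,
when an action follows its predecessor.

```python
def describe(actions: list) -> list:
    """Render an ordered action list as sentences, smoothing flow via successors."""
    sentences = []
    prev = None
    for i, act in enumerate(actions):
        clause = f"{act['agent']} {act['verb']} at the {act['location']}"
        # If the previous action names this one as its successor, link them.
        if prev is not None and prev.get("successor") == i:
            clause = f"after {prev['gerund']} at the {prev['location']}, {clause}"
        sentences.append(clause[0].upper() + clause[1:] + ".")
        prev = act
    return sentences


actions = [
    {"agent": "ships", "verb": "arrive", "gerund": "arriving",
     "location": "port", "successor": 1},
    {"agent": "a ship", "verb": "unloads cargo", "gerund": "unloading",
     "location": "dock"},
]
text = describe(actions)
```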

The basic problem in writing the rules was that of ensuring that all
the necessary information was carried from rule to rule until the actual
characters were emitted. The delicate point was deciding between what
information should be carried along to accommodate both present and future
semantic capabilities of the system and what information was excess and
would only serve to slow the rule processing.

A previously unused capability of the encoding rules was employed to
create a SEGMENT record from pieces of an IDS record. For example, it
was necessary to loop through a PTYP record [8] and copy selected portions
of three of the six records pointed to and combine them in the proper
order into a single SEGMENT record. This new SEGMENT record was used later
on to output a complex sentence consisting of a series of actions. These
actions were picked from the SEGMENT record easily since only the desired
information was present and it was in the proper order.
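The idea of assembling one SEGMENT from pieces of several records can be
sketched as follows. The representation is assumed: the records a
PTYP-like record points to are modeled as a list of dictionaries, and
picks names, in sentence order, which attribute to copy from which record.
build_segment and the attribute values are hypothetical.

```python
def build_segment(pointed_to: list, picks: list) -> list:
    """Combine selected pieces of several records into one ordered SEGMENT.

    pointed_to: the records reached through the PTYP-like record.
    picks: (record index, attribute) pairs, listed in the order needed later.
    Returns (attribute, value) pairs holding only the requested information.
    """
    segment = []
    for index, attr in picks:
        record = pointed_to[index]
        if attr in record:  # an absent piece is skipped rather than copied
            segment.append((attr, record[attr]))
    return segment


# Six records are pointed to; pieces of three of them are combined.
records = [{"SUP": "unload"}, {"X": 1}, {"LOCATION": "dock"},
           {"Y": 2}, {"SUP": "load"}, {"Z": 3}]
segment = build_segment(records, [(0, "SUP"), (2, "LOCATION"), (4, "SUP")])
```

Because the result holds only the requested pieces, already in sentence
order, later rules can read it front to back without searching.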

Once the encoding rules were developed to extract the meaning from the
IDS in the proper order for conversion to English sentences, they were
combined with encoding rules previously constructed to actually output
properly formed sentences. The resulting list of attributes, indicators
and named records is shown in Appendix C and the encoding rules are listed
in Appendix D. The English texts produced by the encoding rules of
Appendix D for three sample problems are given in Figures 13, 14 and 15.
They may be compared with the original expressions of the problems shown
in Figures 1, 5 and 10.
V. CONCLUSIONS


The results discussed in the previous sections are indicative of the 
powerful translational capabilities inherent in NLP. An especially 
attractive feature of translation from source language to IDS to target 
language is that only the encoding rules need be changed if the desired 
target language is changed. The encoding rules for conversion to GPSS 
and English are not dependent on how the information is decoded from the 
source language into the IDS. 

Although the IDS and the encoding rules for translation to GPSS and
English are still expanding as more complicated queuing problems are
being processed, the basic IDS and rule structure have remained fixed.

At present some knowledge of how NLP works is still necessary in order
to write encoding rules. However, with a fixed IDS and rule structure,
an expanded BNF for encoding rules should be sufficient to allow further
development of either of the present encoding rule systems or creation of
a completely new system without knowledge of how the rules are actually
processed by the FORTRAN program.

Finally, it is recommended that both XGES and the IDS-to-English
encoding system be coupled with the Question-Answer system for simulation
problems developed by LCDR E. S. Baker [10] to produce the prototype of
a system which could be of great practical value.